ABSTRACT


Flight data from an F-8 Corsair and a Cessna 172 have been analyzed
to demonstrate specific improvements in the LRC parameter extraction
computer program. The Cramer-Rao bounds (diagonal terms in the dispersion
matrix) have been shown to provide a satisfactory relative measure of the
goodness of parameter estimates. They cannot be used as an absolute measure
because they are uncertain to within a multiplicative factor. This
uncertainty is traced in turn to the uncertainty in the 'noise' bandwidth
in the statistical theory of parameter estimation. The measure is also
derived on an entirely non-statistical basis, which in addition yields an
interpretation of the significance of the off-diagonal (correlation) terms
in the dispersion matrix. The distinction between 'linear' and 'non-linear'
coefficients is shown to be important because of what it implies for a
recommended order of parameter iteration. Techniques for improving
convergence in general have also been developed and tested on flight data.
In particular, an easily implemented modification incorporating a gradient
search is shown to improve initial estimates and thus remove a common cause
of non-convergence.

A close scrutiny of the 'maximum-likelihood' theory (which provides the
basis for current extraction algorithms), indicating its limitations, is an
important by-product of this study. A technique of 'pooling' has been
developed, with demonstrated improvement in processing multiple maneuvers
under similar flight conditions. A variety of questions that arise in
interpreting computer results are also explicitly answered in the light
of the theory developed.
1. SUMMARY


Flight data from an F-8 Corsair and a Cessna 172 have been analyzed
to demonstrate specific improvements in the LRC parameter extraction
computer program. The Cramer-Rao bounds (diagonal terms in the dispersion
matrix) have been shown to provide a satisfactory relative measure of the
goodness of parameter estimates. They cannot be used as an absolute measure
because they are uncertain to within a multiplicative factor. This
uncertainty is traced in turn to the uncertainty in the 'noise' bandwidth
in the statistical theory of parameter estimation. The measure is also
derived on an entirely non-statistical basis, which in addition yields an
interpretation of the significance of the off-diagonal (correlation) terms
in the dispersion matrix. The distinction between 'linear' and 'non-linear'
coefficients is shown to be important because of what it implies for a
recommended order of parameter iteration. Techniques for improving
convergence in general have also been developed and tested on flight data.
In particular, an easily implemented modification incorporating a gradient
search is shown to improve initial estimates and thus remove a common cause
of non-convergence.

A close scrutiny of the 'maximum-likelihood' theory (which provides the
basis for current extraction algorithms), indicating its limitations, is an
important by-product of this study. A technique of 'pooling' has been
developed, with demonstrated improvement in processing multiple maneuvers
under similar flight conditions. A variety of questions that arise in
interpreting computer results are also explicitly answered in the light
of the theory developed.
2. INTRODUCTION


Parameter extraction from flight data has been recognized to be
important for many purposes, for instance:

i) for comparison with wind-tunnel data,

ii) for analytical and simulator studies of flight and handling
qualities, and

iii) for application to adaptive control [1].

While various techniques for parameter extraction have been in use
for some time, it was not until the latter half of the 1960's that digital
computer processing based on the modified Newton-Raphson algorithm made its
advent [2]. This was followed by accelerated activity along similar lines in
the early seventies [3,4]. We are now entering what may be called the second
phase of this effort. In this phase we move from the many studies
establishing the feasibility of the technique to implementing the program on
a routine, day-to-day basis with minimal supervision by a specialist.
Before this can be accomplished, many factors have to be ironed out. The
most important is the development of a calculable, satisfactory measure of
the goodness (the reliability) of the extracted parameters, and its
interpretation. A second and concomitant consideration is the development
of a computer program with the built-in ability to handle cases where the
'normal' algorithm fails to produce acceptable estimates.

In addition, this would make it possible to determine whether the data are
so poor as not to warrant further processing, and thus save the time and
effort of computation.

This report is an attempt to improve the current Langley Research
Center (LRC) parameter extraction program. The study focuses on a number
of questions arising from the use of that program to date. It provides
answers within the framework of two alternate theories: one statistical,
and the other based on sensitivity.

We begin in section 3 with the aircraft models: the equations of motion and
the types of aircraft analyzed. The distinction between two types of
parameters, 'linear' and 'non-linear', is an important consideration in the
basic computational procedure for parameter extraction. That procedure is
described in section 4, where the procedure per se is divorced from the
rationale for using it. The performance of the procedure is then tested on
flight data provided by LRC, using the Optimization Software (OSW)
parameter extraction program. The scatter observed in the estimates
obtained from maneuvers under similar flight conditions leads us to the
primary question of the goodness of the estimates. This question is examined
in the next two sections. The statistical theory is developed in section 5,
culminating in the Cramer-Rao (CR) bounds, which provide a statistical
measure of uncertainty. The error due to the usually accepted practice of
taking the (two-sided) bandwidth of the noise as equal to the sampling rate
is explained here for the first time. Section 6 describes a non-statistical
measure in terms of the largest possible variation in the estimates for a
fixed percent change in the cost functional.

The question of how to deal with currently experienced difficulties in
parameter extraction, centering on convergence problems, is taken up in
section 7. A technique for pooling flight data obtained at identical flight
conditions, yielding estimates better than those from the individual runs,
is developed in section 8 and its performance evaluated. Questions arising
in the use of parameter extraction programs to date are answered in section
9, based on the theory developed in the previous sections. Concluding
remarks in section 10 summarize some of the specific suggestions for
possible program improvement.
3. THE AIRCRAFT MODEL


a. Equations of Motion

We begin with the model: the linearized equations of airplane motion
(lateral mode only) in state-space form, and the associated sensor
measurements. The state vector, written as a column, is

$$X = \mathrm{col}\,[\beta,\ p,\ r,\ \phi]$$

The control vector is

$$U = \mathrm{col}\,(\delta_a,\ \delta_r,\ 0,\ 0,\ 1)$$

The vector of sensor measurements is denoted $Y$:

$$Y = \mathrm{col}\,[\beta,\ p,\ r,\ \phi,\ a_y]$$

The continuous-time dynamic equations relating these quantities are

$$R\dot{X} = AX + BU$$

$$Y = CX + DU + E\dot{X} + N$$

where, in particular, the matrix of stability derivatives is

$$A = \begin{bmatrix} Y_\beta & \sin\alpha & -\cos\alpha & \frac{g}{V}\cos\phi\cos\theta \\ L_\beta & L_p & L_r & 0 \\ N_\beta & N_p & N_r & 0 \\ 0 & 1 & \cos\phi\tan\theta & 0 \end{bmatrix}$$

and where $N$ is the 'noise', the one really vague quantity in this
description. As we shall see, it provides us with the 'rationale' for our
estimation procedure, but its quantitative interpretation is beset with
uncertainties. It is generally taken as white Gaussian, or as Gaussian with
a bandwidth large compared to that of the airplane response. The precise
description, however, is a crucial point to which we shall return in
section 5. The noise is assumed to be independent from sensor to sensor.

The parameters to be identified are the various derivatives indicated by a
subscript. The 'stability' derivatives are in the matrix $A$. The terms
$\sin\alpha$, $\cos\alpha$, $(g/V)\cos\phi\cos\theta$ and $\cos\phi\tan\theta$
are taken to be known constants. (In reality, of course, these may vary in
time, but the variation is taken to be small enough to be neglected. There
is in fact no difficulty in accounting for the time dependence if it is
known.) The 'control' derivatives are in the first two columns of the $B$
matrix. The parameters shown in the last column are 'bias' parameters. In
themselves they have no physical significance and can, in particular,
change even under identical flight conditions. A bias term also occurs in
the matrix $D$ and is described as "YBIAS". It accounts for the average of
the $(g/V)\sin\phi\cos\theta$ term as well as any instrument bias in the
measurement.
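The estimation procedure works throughout with the calculated response $\hat{Y}(\theta,t)$, which is obtained by integrating the model above. The following is a minimal Python sketch made under stated assumptions. It uses a fixed-step forward-Euler scheme, which is a hypothetical choice, since the integration method used by the OSW program is not described here. It also starts from a zero initial state and omits the noise $N$. The function name `simulate_outputs` is illustrative only.

```python
import numpy as np


def simulate_outputs(R, A, B, C, D, E, U, dt):
    """Noise-free response of R x' = A x + B u, y = C x + D u + E x'.

    U holds one input vector per row (one row per sample). Integration is
    forward Euler from x(0) = 0, an assumption made for this sketch only.
    """
    R, A, B, C, D, E = (np.atleast_2d(np.asarray(M, dtype=float))
                        for M in (R, A, B, C, D, E))
    U = np.atleast_2d(np.asarray(U, dtype=float))
    x = np.zeros(A.shape[0])
    Y = np.empty((U.shape[0], C.shape[0]))
    for k, u in enumerate(U):
        xdot = np.linalg.solve(R, A @ x + B @ u)  # state derivative from R x' = A x + B u
        Y[k] = C @ x + D @ u + E @ xdot           # output equation, noise N omitted
        x = x + dt * xdot                         # forward-Euler step
    return Y
```

Any variable-step integrator could replace the Euler step. The structure of the calculation (solve for $\dot{X}$, form $Y$, advance $X$) stays the same.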
b. Types of Aircraft Analyzed

All data presented in this study are confined to two aircraft: a Cessna
172 and an F-8.

The Cessna 172 flight tests were conducted at NASA LRC. The Cessna is a
light, high-wing, single-engine general aviation airplane. Its relevant
physical characteristics are given below.


CESSNA PHYSICAL CHARACTERISTICS

Wing area     174 ft²
Span          36.2 ft
Chord         4.87 ft
Weight        2200 lb
I_x           872 slug-ft²
I_z           1701 slug-ft²
I_xz          14 slug-ft²


In the data supplied, the angle-of-attack and sideslip measurements had been
corrected for instrument position by LRC. The accelerometer offset from the
C.G. was initially stated to be negligible. In the course of the OSW data
processing, however, significant correlation between $a_y$ and $p$ was
detected visually, indicating a lateral accelerometer offset (above or below
the C.G.). The accelerometer location was then taken as an additional
unknown parameter to be determined by the OSW parameter extraction program.
This yielded an estimate of approximately 0.9 ft below the C.G., agreeing
well with the measured position. Any parameter that can be measured
independently and accurately should, of course, be modeled as known. The
measured position of the accelerometer should therefore be included rather
than neglected as unimportant. The accelerometer location did in fact have a
significant effect on several of the minor coefficients. Correcting for the
instrument location also approximately halved the r.m.s. residual fit on
$a_y$.

The second aircraft was a modified Navy F-8 Corsair 2. This particular
aircraft was fitted with a supercritical wing for evaluation and was flown
at NASA-Dryden. Its physical characteristics are given below.


F-8 PHYSICAL DATA

Wing area     25.5 m²
Span          13.14 m
Chord         2.08 m
Weight        10500 N
I_x           20500 kg-m²
I_z           140000 kg-m²
I_xz          4500 kg-m²


The angle-of-attack vane on this aircraft had a significant error
(about 1°) that was not corrected in the data. The $\sin\alpha$ term in the
$A$ matrix was therefore allowed to be an extra unknown.

Data from both aircraft were transmitted on a PCM link. For the F-8 the
sampling rate was 25 samples per second. For the Cessna the rate was only
10 samples per second, which was low enough to cause concern. It proved
adequate nevertheless because of the good quality of the data: good
resolution, low noise, and an accurate linear model.
4. MAXIMUM LIKELIHOOD ESTIMATION PROCEDURE

a. The Cost Functional

The 'maximum likelihood' technique of estimating the parameters is to
minimize the expression below. (It is not strictly maximum likelihood in
statistical terms, as we shall indicate in section 5.)

$$\sum_{i=1}^{n}\log d_i + \sum_{i=1}^{n}\frac{1}{d_i}\,\frac{1}{T}\int_0^T \big(Y_i(t)-\hat{Y}_i(\theta,t)\big)^2\,dt \qquad (4.1)$$

The symbols are as follows:

- The subindex $i$ denotes the $i$th component of the observation vector,
  $n$ being the size of the vector (5 in the present case).
- $\hat{Y}(\theta,t)$ is the calculated observation vector, obtained using
  approximate parameter values and the known input. It is thus a function
  of the parameter vector $\theta$.
- The $d_i$, $i = 1,\dots,n$, are positive constants.
- $T$ is the length of the available time history.

The minimization is with respect to the parameter set

$$\mathrm{col}\,[\theta,\ d_1,\ \dots,\ d_n]$$

Pending further examination (sec. 5), we may accept minimizing (4.1) as a
'good thing' to do. The second term in (4.1) is a 'mean-square error',
'weighted' by $\{d_i\}$. Moreover, the cost functional is a minimum at the
'true value' of the parameters in $\theta$. This can be seen as follows. At
a minimum, the gradient (the vector of first partial derivatives) with
respect to all the parameters must vanish. We must therefore have


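$$0 = \frac{1}{d_i} - \frac{1}{d_i^{2}}\,\frac{1}{T}\int_0^T \big(Y_i(t)-\hat{Y}_i(\theta,t)\big)^2\,dt, \qquad i = 1,\dots,n \qquad (4.2)$$

and

$$0 = \sum_{i=1}^{n}\frac{1}{d_i}\,\frac{1}{T}\int_0^T \frac{\partial \hat{Y}_i}{\partial\theta}(\theta,t)\,\big(Y_i(t)-\hat{Y}_i(\theta,t)\big)\,dt$$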


Because of (4.2), the minimization proceeds in two successive series of
steps. We first set

$$d_i = \frac{1}{T}\int_0^T \big(Y_i(t)-\hat{Y}_i(\theta,t)\big)^2\,dt \qquad (4.3)$$

and then minimize the 'cost functional'

$$q(\theta) = \frac{1}{T}\sum_{i=1}^{n}\frac{1}{d_i}\int_0^T \big(Y_i(t)-\hat{Y}_i(\theta,t)\big)^2\,dt \qquad (4.5)$$

with respect to $\theta$, keeping $\{d_i\}$ fixed. We then calculate new
values of $d_i$ from (4.3) at the minimum and repeat the minimization of
(4.5). Note that (4.3) has the minimal (ideal) value zero at the true value
of the parameters $\theta$. Of course, we do not expect to see an 'exact'
zero. Note also that, in this way, (4.5) reduces at the 'minimum' to the
value

$$q(\hat\theta) = n \qquad (4.6)$$

again assuming we do not run across 'exact' zeros for $d_i$. We call

$$\hat{d}_i = \text{mean-square fit error for the } i\text{th measurement}, \qquad \sum_{i=1}^{n}\hat{d}_i = \text{total mean-square fit error} \qquad (4.7)$$

Similarly, we shall call

$$Z(t) = Y(t) - \hat{Y}(\hat\theta,t), \qquad 0 \le t \le T \qquad (4.8)$$

the residual, where $\hat\theta$ is the final parameter estimate.
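The two-step structure of (4.3) and (4.5) can be checked numerically. The Python sketch below uses hypothetical function names and replaces the time integrals by Riemann sums over uniformly sampled data. It computes $d_i$ from the residuals and confirms two properties: with these $d_i$, the cost $q(\theta)$ equals $n$, as in (4.6); and the $d_i$ update minimizes (4.1) for fixed $\theta$. The residual data are synthetic.

```python
import numpy as np


def fit_errors(Y, Y_hat, dt):
    """(4.3): d_i = (1/T) * integral of (Y_i - Yhat_i)^2 dt, via a Riemann sum."""
    T = Y.shape[0] * dt
    return np.sum((Y - Y_hat) ** 2, axis=0) * dt / T


def q_cost(Y, Y_hat, d, dt):
    """(4.5): q = (1/T) * sum_i (1/d_i) * integral of (Y_i - Yhat_i)^2 dt."""
    T = Y.shape[0] * dt
    return float(np.sum(np.sum((Y - Y_hat) ** 2, axis=0) * dt / (T * d)))


def full_cost(Y, Y_hat, d, dt):
    """(4.1): sum_i log d_i + q."""
    return float(np.sum(np.log(d))) + q_cost(Y, Y_hat, d, dt)


# Synthetic residuals for n = 5 measurements with very different scales.
rng = np.random.default_rng(1)
dt = 0.1
scales = np.array([0.5, 0.02, 0.03, 0.01, 0.001])
Y = rng.standard_normal((300, 5))
Y_hat = Y + scales * rng.standard_normal((300, 5))
d = fit_errors(Y, Y_hat, dt)
print(q_cost(Y, Y_hat, d, dt))   # equals n = 5 up to rounding
```

Scaling all $d_i$ up or down from the values given by (4.3) raises (4.1). This is why the update in (4.3) can be alternated with minimization over $\theta$.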
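The column vector with components

$$G(\theta)_j = -\frac{2}{T}\sum_{i=1}^{n}\frac{1}{d_i}\int_0^T \frac{\partial \hat{Y}_i}{\partial\theta_j}(\theta,t)\,\big(Y_i(t)-\hat{Y}_i(\theta,t)\big)\,dt \qquad (4.9)$$

where $\theta_j$ are the components of $\theta$, will be called the gradient
of $q(\theta)$ and denoted $G(\theta)$. The matrix $S(\theta)$ with
components

$$S(\theta) = \{S(\theta)_{jk}\}, \qquad S(\theta)_{jk} = \frac{2}{T}\sum_{i=1}^{n}\frac{1}{d_i}\int_0^T \frac{\partial \hat{Y}_i}{\partial\theta_j}(\theta,t)\,\frac{\partial \hat{Y}_i}{\partial\theta_k}(\theta,t)\,dt \qquad (4.10)$$

will be called the 'sensitivity' matrix. It is recognized as the part of
the Hessian of $q(\theta)$ that is independent of the data. Moreover, it is
non-negative definite.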
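As an illustration of (4.9) and (4.10), the sketch below forms $G(\theta)$ and $S(\theta)$ from sampled sensitivities $\partial\hat{Y}_i/\partial\theta_j$. Here the sensitivities are approximated by central differences. The two-output `model` is purely hypothetical and stands in for the aircraft response. The accompanying check compares $G$ with a finite-difference gradient of $q(\theta)$ and confirms that $S$ is non-negative definite.

```python
import numpy as np


def model(theta, t):
    """Hypothetical two-output model standing in for Yhat(theta, t)."""
    a, b = theta
    return np.column_stack([a * np.exp(-b * t), a * b * np.sin(t)])


def sensitivities(theta, t, eps=1e-6):
    """dYhat_i/dtheta_j by central differences; shape (samples, outputs, params)."""
    cols = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        cols.append((model(theta + step, t) - model(theta - step, t)) / (2 * eps))
    return np.stack(cols, axis=-1)


def q_cost(theta, t, Y, d):
    """(4.5) with the integral replaced by a Riemann sum."""
    dt = t[1] - t[0]
    T = t.size * dt
    return float(np.sum(np.sum((Y - model(theta, t)) ** 2, axis=0) * dt / (T * d)))


def gradient_and_sensitivity(theta, t, Y, d):
    """(4.9) and (4.10) with the time integrals replaced by Riemann sums."""
    dt = t[1] - t[0]
    T = t.size * dt
    J = sensitivities(theta, t)               # J[k, i, j] = dYhat_i/dtheta_j at t_k
    resid = Y - model(theta, t)               # resid[k, i]
    G = -(2.0 / T) * np.einsum("kij,ki,i->j", J, resid, 1.0 / d) * dt
    S = (2.0 / T) * np.einsum("kij,kil,i->jl", J, J, 1.0 / d) * dt
    return G, S
```

In an operational program the sensitivities would normally come from integrating sensitivity equations rather than from finite differences. The assembly of $G$ and $S$ from them is unchanged.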

b. Recommended Minimization Procedure

Since there are many ways of minimizing (4.5), we now describe a
'recommended' procedure. [It is more than a recommendation; it will be
closely tied in with the theory in sections 5 and 6.] When we have no
'good' initial values for $\{d_i\}$, we take them all to be the same; that
is, we do not favor one measurement over another. Next, we fix the
parameters in the $A$ matrix at their nominal starting values and minimize
$q(\theta)$ with respect to the unknown parameters in $B$ and $D$. We shall
refer to the latter as 'linear' parameters, since they enter
$\hat{Y}(\theta,t)$ linearly. Moreover, derivatives of $q(\theta)$ with
respect to these parameters of order higher than two vanish identically. As
a result, the minimizing parameters are determined by
$$\theta_1 = \theta_s - S(\theta_s)^{-1}G(\theta_s) \qquad (4.11)$$

where $\theta_s$ is the starting value of $\theta$. We simply set the
derivatives with respect to the 'non-linear' parameters to zero in
$G(\theta)$ and $S(\theta)$. This works because derivatives of order higher
than two are zero, so that

$$G(\theta_1) = G(\theta_s) + S(\theta_s)(\theta_1 - \theta_s) = 0 \quad \text{by (4.11)}.$$

We note now that (4.11) can be rewritten as

$$S(\theta_s)\,[\theta_1 - \theta_s] = -G(\theta_s) \qquad (4.12)$$

Equation (4.12) is equivalent to (4.11) when $S(\theta_s)$ is non-singular.
It has a solution even if $S(\theta_s)$ is singular, however, and this
solution, although not unique, still yields a minimizing $\theta_1$. This
can be seen as follows. Suppose that for some non-zero vector $h$ we have

$$S(\theta_s)\,h = 0 \qquad (4.13)$$

where the components of $h$ corresponding to the parameters in the matrix
$A$ are zero. Because the derivatives of order higher than two are zero, we
have the Taylor expansion

$$q(\theta_s + h) = q(\theta_s) + [G(\theta_s), h] + \tfrac{1}{2}[S(\theta_s)h, h] \qquad (4.14)$$

where we have used the notation

$$[a, b] = \mathrm{Tr}\, ab^{*} = \mathrm{Tr}\, a^{*}b .$$

Since $[S(\theta_s)h, h] = 0$, applying (4.14) to $kh$ gives
$q(\theta_s + kh) = q(\theta_s) + k\,[G(\theta_s), h]$ for every scalar
$k$. Because $q(\cdot)$ is non-negative, this is possible only if

$$[G(\theta_s), h] = 0,$$

and hence

$$q(\theta_s + kh) = q(\theta_s) \qquad (4.15)$$

for any scalar multiplier $k$. Let $\{\lambda_i\}$ denote the non-zero
eigenvalues of $S(\theta_s)$ and $\{e_i\}$ the corresponding orthonormalized
eigenvectors. These lie in the space of parameter vectors whose components
corresponding to the parameters in the matrix $A$ are zero. Since
$G(\theta_s)$ is orthogonal to every $h$ satisfying (4.13), it lies in the
range of $S(\theta_s)$. It can therefore be expressed in terms of the
$\{e_i\}$ as

$$G(\theta_s) = \sum_i [e_i, G(\theta_s)]\, e_i$$

and hence

$$S(\theta_s)\,[\theta_1 - \theta_s] = -G(\theta_s)$$

has the solution

$$\theta_1 - \theta_s = -\sum_i \frac{[e_i, G(\theta_s)]}{\lambda_i}\, e_i \qquad (4.16)$$

This solution is not unique, since we can add any $h$ satisfying (4.13).
The corresponding value of $q(\cdot)$ is then the same, however, by (4.15).
Hence (4.12) has a solution, and any technique used to 'solve' (4.12)
should yield an acceptable $\theta_1$.
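The solution (4.16) is the minimum-norm solution obtained by discarding the zero eigenvalues of $S(\theta_s)$. The Python sketch below illustrates it on a hypothetical single-output linear model whose last two regressors are identical, so that $S$ is singular. It shows that the step still drives the gradient to zero, and that adding a null vector $h$ leaves $q$ unchanged, as in (4.15). With one output and $d_1 = 1$, (4.5) reduces to the mean-square residual.

```python
import numpy as np


def eigen_step(S, G, rel_tol=1e-10):
    """Solve S @ delta = -G as in (4.16), keeping only the non-zero eigenvalues of S."""
    lam, E = np.linalg.eigh(S)            # S is symmetric and non-negative definite
    keep = lam > rel_tol * lam.max()
    coeffs = E[:, keep].T @ G             # the projections [e_i, G(theta_s)]
    return -E[:, keep] @ (coeffs / lam[keep])


t = np.linspace(0.0, 10.0, 251)
Phi = np.column_stack([np.ones_like(t), t, t])    # duplicate column -> singular S
rng = np.random.default_rng(3)
y = 2.0 + 3.0 * t + 0.1 * rng.standard_normal(t.size)


def q(theta):
    """Single-output form of (4.5) with d_1 = 1."""
    return float(np.mean((y - Phi @ theta) ** 2))


def grad_and_S(theta):
    """Gradient and sensitivity matrix of q for the linear model."""
    r = y - Phi @ theta
    return -2.0 * Phi.T @ r / t.size, 2.0 * Phi.T @ Phi / t.size


theta_s = np.zeros(3)
G, S = grad_and_S(theta_s)
theta_1 = theta_s + eigen_step(S, G)
h = np.array([0.0, 1.0, -1.0])                    # S @ h = 0
print(q(theta_1), q(theta_1 + 5.0 * h))           # equal up to rounding
```

A least-squares solver with a singular-value cutoff yields the same minimum-norm step. This illustrates the remark that any technique used to 'solve' (4.12) gives an acceptable $\theta_1$.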

After this first iteration step, all parameters in $\theta$ are allowed to
change and algorithm (4.12) is used. Since we are no longer confined to
linear parameters, $S(\theta)$ is no longer the Hessian. For this reason
(4.12) is referred to as the 'modified' Newton-Raphson technique (Taylor
and Iliff [2]). An element of the actual Hessian is

$$\frac{\partial^2 q(\theta)}{\partial\theta_j\,\partial\theta_k} = S(\theta)_{jk} - \frac{2}{T}\sum_{i=1}^{n}\frac{1}{d_i}\int_0^T \frac{\partial^2\hat{Y}_i}{\partial\theta_j\,\partial\theta_k}(\theta,t)\,\big(Y_i(t)-\hat{Y}_i(\theta,t)\big)\,dt \qquad (4.17)$$

and the correction term is small if the residual is small. The more
compelling reason for using only the first term is that calculating the
second partial derivatives of $\hat{Y}(\theta,t)$ is normally quite tedious,
while the improvement obtained would not be significant. This is because
the N-R technique is itself efficient only 'close' to the true parameter
values, so the residual must be small to begin with.

Note also that when all parameters are allowed to change, the gradient is
not necessarily zero after each step, because the third and higher order
derivatives are no longer zero. In fact, the cost functional $q(\theta)$ may
not decrease monotonically with each iteration if $S(\theta_n)$ has
eigenvalues close to zero. 'Normally', however, these difficulties will not
appear. How to deal with them when they do occur is discussed in section 7,
under improving convergence.
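The recommended order of iteration can be illustrated on a hypothetical single-output model $\hat{y}(t) = a\,e^{-bt} + c$, in which $a$ and $c$ enter linearly and $b$ non-linearly. The sketch below is an illustration, not the OSW implementation. Its first step updates only the 'linear' parameters, which brings them exactly to their optimum for the starting $b$. Later steps apply the modified Newton-Raphson update (4.12) to all parameters, with $S$ built from first derivatives only. With a single output, $d_1$ only rescales $q$, so it is fixed at 1 here.

```python
import numpy as np


def model(theta, t):
    a, b, c = theta
    return a * np.exp(-b * t) + c


def jacobian(theta, t):
    """dyhat/dtheta for the hypothetical model a*exp(-b t) + c."""
    a, b, c = theta
    e = np.exp(-b * t)
    return np.column_stack([e, -a * t * e, np.ones_like(t)])


def modified_newton_raphson(y, t, theta0, linear_idx=(0, 2), n_iter=10):
    theta = np.asarray(theta0, dtype=float).copy()
    costs = []
    for it in range(n_iter):
        r = y - model(theta, t)
        costs.append(float(np.mean(r ** 2)))      # q(theta) with d_1 = 1
        J = jacobian(theta, t)
        G = -2.0 * J.T @ r / t.size               # gradient, cf. (4.9)
        S = 2.0 * J.T @ J / t.size                # sensitivity matrix, cf. (4.10)
        # First step: 'linear' parameters only; afterwards: all parameters.
        idx = list(linear_idx) if it == 0 else list(range(theta.size))
        theta[idx] += np.linalg.solve(S[np.ix_(idx, idx)], -G[idx])
    costs.append(float(np.mean((y - model(theta, t)) ** 2)))
    return theta, costs


t = np.linspace(0.0, 10.0, 201)
y = model((2.0, 0.7, 0.5), t)                      # noise-free test data
theta_hat, costs = modified_newton_raphson(y, t, theta0=(1.0, 0.65, 0.0))
```

Because $q$ is exactly quadratic in $a$ and $c$, the first step can only lower the cost. This mirrors the large first-iteration reduction reported for the Cessna data below.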

Illustration. The normal situation is illustrated by the Cessna data, Run
11. Here the quantities $\{d_i\}$ were initially set at $d_1^{-1} = 9.37$,
$d_2^{-1} = 3.55$, $d_3^{-1} = 3.64$, $d_4^{-1} = 4.28$ and
$d_5^{-1} = 8300$. The starting parameter values were:

$C_{y} = -.006$
$C_{n} = -.1$
$C_{l} = -.001$
$C_{\ldots\delta_a} = -.0013$
$C_{\ldots\delta_r} = .0002$
$\ldots = .00032$
$C_{\ldots} = -.5$
$C_{l} = -.00005$
$C_{\ldots} = .0002$
The behavior of $q(\theta)$ by iteration is plotted in Figure 1. The cost
functional at iteration 1 was obtained by varying only the 'linear
parameters' from the 'starting values' at iteration 0. In the succeeding
iterations, all parameters were changed at each iteration using the
modified N-R algorithm. Note that the most significant reduction in the
cost functional took place in the first iteration and that the functional
levelled off by about the third. Table 1 shows the corresponding parameter
values at each iteration, Table 2 the gradient, and Table 3 the r.m.s.
residual of each sensor measurement.

The gradients of the non-linear terms were not computed for the first
iteration, as only the linear terms were needed there. Note the dramatic
decrease, about 4-1/2 orders of magnitude, in the gradient of the linear
terms from iteration 0 (starting values) to iteration 1. These numbers
should theoretically be 0. The remaining values, however, are well within
the expected numerical accuracy of solving an 8-dimensional system on the
IBM in single precision (the 5 bias terms are not listed). Overall, this
case exhibits ideal convergence characteristics.

c. Results: Cessna 172 Flight Data

Using the procedure outlined above, several runs of the Cessna 172 data
were processed. The results obtained for 4 cases (2 aileron and 2 rudder)
are summarized in Table 4. The starting values of $\{d_i\}$ (the same as
before) and of the parameters were the same in all cases. The bias
parameters are not tabulated.

Since the flight conditions were close enough to be considered identical,
we would expect the extracted parameter values to be the same.
Unfortunately, this is not borne out by the data, even though the fit
errors on all the runs are about the same and acceptably low. The most
striking discrepancies are highlighted in Table 5.

Representative time-history plots (actual and calculated) for a rudder run
(case 10) and an aileron run (case 11) are shown in Figures 2 and 3
respectively. The fits are generally good. The only evident abnormality is
due to the accelerometer location problem referred to earlier. Correcting
for it, however, did not affect the parameter discrepancies observed.
5. ERROR BOUNDS; STATISTICAL THEORY


a. Continuous-Time Model

In this section we indicate a 'statistical' approach to the basic question:
how can we tell how good our determination of the parameters is? The theory
rests on the assumption that the limiting error in the observation $Y(t)$
is 'random noise'. We begin with the 'most idealized' case, in which we
assume that the noise is temperature-limited black-body radiation, that is,
that the noise is 'Gaussian white'. Unfortunately, current fashion in
stochastic process theory requires us to be more pedantic, as follows. We
say

$$A(t) = \int_0^t Y(s)\,ds = \int_0^t V(s)\,ds + G\,W(t)$$

where $W(t)$ is a full-rank Wiener process and

$$G = \mathrm{diag}\,\big(\sqrt{g_1},\ \dots,\ \sqrt{g_n}\big).$$

Here $g_i$ corresponds to the spectral density of the $i$th noise component.
Equivalently,

$$E\big[G W(t)\,(G W(t))^{*}\big] = \mathrm{diag}\,(g_1,\ \dots,\ g_n)\,t$$

where $E$ stands for expected value, and

$$V(t) = Cx + Du + ER^{-1}(Ax + Bu).$$


In this formalism we assume that the $\{g_i\}$ are known. The logarithm of
the likelihood functional then becomes

$$\int_0^T \big[G^{-2}\hat{Y}(\theta,t),\ dA(t)\big] - \frac{1}{2}\int_0^T \big[G^{-2}\hat{Y}(\theta,t),\ \hat{Y}(\theta,t)\big]\,dt \qquad (5.1)$$

Note that (5.1) cannot be expressed in the usual 'least squares' form
(cf. (4.5)). Nevertheless, its gradient is the same as that of the
least-squares form, since the term that is missing is of the form

$$\frac{1}{2}\int_0^T \big[G^{-2}\dot{A}(t),\ \dot{A}(t)\big]\,dt,$$

which in itself is meaningless in the Wiener process formalism. But (5.1) is
a true likelihood functional, and hence maximizing it yields a
'maximum-likelihood' estimator. At the maximum, the gradient of (5.1)
vanishes:

$$G(\theta) = \int_0^T \Big[G^{-2}\,\frac{\partial\hat{Y}}{\partial\theta}(\theta,t),\ \hat{Y}(\theta,t)\,dt - dA(t)\Big] = 0 \qquad (5.2)$$


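To solve (5.2) we use the modified N-R algorithm:

$$\theta_{n+1} = \theta_n - S(\theta_n)^{-1}\,G(\theta_n) \qquad (5.3)$$

where

$$S(\theta)_{jk} = \int_0^T \Big[G^{-2}\,\frac{\partial\hat{Y}}{\partial\theta_j}(\theta,t),\ \frac{\partial\hat{Y}}{\partial\theta_k}(\theta,t)\Big]\,dt \qquad (5.4)$$

We can prove the following asymptotic consistency theorem.

Theorem 5.1. Suppose that

$$\bar{S}(\theta) = \lim_{T\to\infty}\frac{1}{T}\,S(\theta) \qquad (5.5)$$

exists and is positive definite in an open parameter set $\mathcal{N}$
containing the unknown point $\theta_0$. Then there exists a non-zero
neighborhood of $\theta_0$ in which (5.2) has a root for all $T$
sufficiently large. Let $\hat\theta_T$ denote the root. Then

$$\lim_{T\to\infty} E\,\|\hat\theta_T - \theta_0\|^2 = 0 \qquad (5.6)$$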

and in fact

$$\lim_{T\to\infty} T\,E\,\|\hat\theta_T - \theta_0\|^2 = \mathrm{Tr}\,\bar{S}(\theta_0)^{-1}, \qquad \lim_{T\to\infty} \sqrt{T}\,\big\|E[\hat\theta_T] - \theta_0\big\| = 0 \qquad (5.7)$$

Proof. For a proof see Balakrishnan [5]. The main calculation is that

$$0 = G(\hat\theta_T) = G(\theta_0) + S(\theta_0)(\hat\theta_T - \theta_0) + \text{higher order terms.}$$

We also have the C-R bound for unbiased estimators:

$$\mathrm{Var}\,\hat\theta_T \ge S(\theta_0)^{-1} \qquad (5.8)$$

In fact, for large enough $T$ we can use the approximation

$$\hat\theta_T = \theta_0 - S(\theta_0)^{-1}G(\theta_0) + \cdots$$

where the second term is Gaussian with mean zero and variance
$S(\theta_0)^{-1}$. Since $\theta_0$ is unknown, one usually calculates

$$S(\hat\theta_T)^{-1}$$

as an estimate of the variance.
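The C-R bound can be checked against a Monte Carlo experiment in the discrete, independent-sample setting, in the spirit of the simulated test described in section 5c. The sketch assumes a hypothetical model that is linear in two parameters, with two outputs carrying independent white noise of known standard deviation $\sigma_i$. The information matrix is then $\sum_i \sigma_i^{-2}\Phi_i^{*}\Phi_i$. The ratio of the sample standard deviation of the estimates to the C-R standard deviation should be close to one. All model choices and numbers are illustrative.

```python
import numpy as np


def information_matrix(Phis, sigmas):
    """Sum over outputs of Phi_i^T Phi_i / sigma_i^2 (independent white noise)."""
    return sum(P.T @ P / s ** 2 for P, s in zip(Phis, sigmas))


def weighted_ls(Phis, ys, sigmas, F):
    """Maximum-likelihood estimate for the linear-Gaussian case."""
    b = sum(P.T @ y / s ** 2 for P, y, s in zip(Phis, ys, sigmas))
    return np.linalg.solve(F, b)


rng = np.random.default_rng(0)
t = np.arange(250) * 0.04                             # 25 samples per second
Phis = [np.column_stack([np.ones_like(t), t]),
        np.column_stack([np.zeros_like(t), np.sin(t)])]
sigmas = (0.05, 0.2)
theta_true = np.array([0.3, -1.2])

F = information_matrix(Phis, sigmas)
cr_sd = np.sqrt(np.diag(np.linalg.inv(F)))            # C-R bound as a standard deviation

estimates = []
for _ in range(2000):
    ys = [P @ theta_true + s * rng.standard_normal(t.size) for P, s in zip(Phis, sigmas)]
    estimates.append(weighted_ls(Phis, ys, sigmas, F))
ratio = np.std(np.array(estimates), axis=0, ddof=1) / cr_sd
print(ratio)                                          # both entries close to 1
```

Here the noise samples really are independent, so the computed bound matches the observed scatter. The following subsections show why the same agreement need not hold for flight data.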


Limitations

The main drawback in the above is that the spectral density matrix

$$\mathrm{diag}\,(g_1,\ \dots,\ g_n)$$

is assumed known. This drawback affects only the calculation of the C-R
bound, however. We can show that any diagonal matrix with positive entries
may be used in place of $\mathrm{diag}\,(g_1,\dots,g_n)$ while still
obtaining asymptotic unbiasedness and asymptotic consistency of the
parameter estimates.

Band-Limited Noise Approach

We may make the more reasonable assumption that $N(t)$ is band-limited
Gaussian, with a bandwidth large compared to that of $\hat{Y}(\theta,t)$.
We may then also consider the case where the noise power is unknown. We
shall show that the noise power can then be estimated as well. Uncertainty
will again arise in the C-R bound, however, because the bandwidth is not
known precisely.

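We invoke the cost functional

$$\sum_{i=1}^{n}\log d_i + \sum_{i=1}^{n}\frac{1}{d_i}\,\frac{1}{T}\int_0^T \big(\hat{Y}_i(\theta,t) - Y_i(t)\big)^2\,dt \qquad (5.9)$$

which we minimize with respect to $\{d_i\}$ and $\theta$. As before (see
sec. 4a), we take

$$d_i = \frac{1}{T}\int_0^T \big(\hat{Y}_i(\theta,t) - Y_i(t)\big)^2\,dt \qquad (5.10)$$

and minimize

$$q(\theta) = \sum_{i=1}^{n}\frac{1}{d_i}\,\frac{1}{T}\int_0^T \big(\hat{Y}_i(\theta,t) - Y_i(t)\big)^2\,dt \qquad (5.11)$$

Now (5.11) is not the log-likelihood functional, even apart from the fact
that $d_i$ is only an estimate of the noise variance. Nevertheless, it can
yield an asymptotically unbiased and asymptotically consistent estimate of
$\theta$. Denoting the gradient of (5.11) with respect to $\theta$ by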
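$$G(\theta) = \frac{2}{T}\sum_{i=1}^{n}\frac{1}{d_i}\int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta,t)\,\big(\hat{Y}_i(\theta,t) - Y_i(t)\big)\,dt \qquad (5.12)$$

and the root by $\hat\theta_T$, we have the Taylor expansion

$$0 = G(\hat\theta_T) = G(\theta_0) + S(\theta_0)\,[\hat\theta_T - \theta_0] + \cdots$$

where $S(\theta)$ is the matrix

$$S(\theta) = \frac{2}{T}\sum_{i=1}^{n}\frac{1}{d_i}\int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta,t)\,\frac{\partial\hat{Y}_i}{\partial\theta}(\theta,t)^{*}\,dt \qquad (5.13)$$

It follows that

$$\hat\theta_T \approx \theta_0 - S(\theta_0)^{-1}\,G(\theta_0) \qquad (5.14)$$

where

$$G(\theta_0) = -\frac{2}{T}\sum_{i=1}^{n}\frac{1}{d_i}\int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,t)\,n_i(t)\,dt, \qquad (5.15)$$

$n_i(t)$ being the noise in the $i$th observation. At $\theta = \theta_0$,

$$E[d_i] = \frac{1}{T}\int_0^T E\big[n_i(t)^2\big]\,dt = R_i(0)$$

where $R_i(\tau)$ is the covariance function of the noise process $n_i(t)$.
Hence $d_i$ is an unbiased estimate of the noise variance. Replacing $d_i$
by its expectation in (5.15), we have for the variance of $G(\theta_0)$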



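$$E\big[G(\theta_0)G(\theta_0)^{*}\big] = \frac{4}{T^2}\sum_{i=1}^{n}\frac{1}{R_i(0)^2}\int_0^T\!\!\int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,t)\,\frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,s)^{*}\,R_i(t-s)\,dt\,ds = \frac{4}{T^2}\sum_{i=1}^{n}\frac{1}{R_i(0)^2}\int_{-\infty}^{\infty} \psi_i(f)\,\psi_i(f)^{*}\,P_i(f)\,df \qquad (5.16)$$

where
$\psi_i(f) = \int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,t)\,e^{-2\pi \mathrm{i} f t}\,dt$
and $P_i(f)$ is the spectral density of the noise $n_i(t)$. If we take

$$P_i(f) = g_i^2 \ \text{ for } -B_i < f < B_i, \qquad P_i(f) = 0 \ \text{ otherwise},$$

and assume $B_i$ large, we can reduce (5.16) to

$$E\big[G(\theta_0)G(\theta_0)^{*}\big] \approx \frac{4}{T^2}\sum_{i=1}^{n}\frac{g_i^2}{R_i(0)^2}\int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,t)\,\frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,t)^{*}\,dt$$

where we note that

$$R_i(0) = \int_{-B_i}^{B_i} P_i(f)\,df = 2B_i\,g_i^2 .$$

Hence, within the approximation of replacing $d_i$ by $E[d_i] = R_i(0)$, we
have

$$\mathrm{var}\,[\hat\theta_T - \theta_0] \approx S(\theta_0)^{-1}\,E\big[G(\theta_0)G(\theta_0)^{*}\big]\,S(\theta_0)^{-1}.$$

Using the reasonable approximation $B_i = B$ for all $i$, this becomes

$$\mathrm{var}\,[\hat\theta_T - \theta_0] \approx \frac{1}{BT}\,S(\theta_0)^{-1} = \Big[\sum_{i=1}^{n}\frac{1}{g_i^2}\int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,t)\,\frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,t)^{*}\,dt\Big]^{-1} \qquad (5.17)$$

Note that this formula again requires knowledge of the bandwidth:

$$g_i^2 = \frac{d_i}{2B} \qquad (5.18)$$

Within the band-limited noise assumption we can also show that the
normalized variance of the estimate $d_i$ is

$$\frac{E\big[(d_i - R_i(0))^2\big]}{R_i(0)^2} \approx \frac{1}{B_i T} \qquad (5.19)$$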
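The two forms of (5.17) can be compared numerically. The sketch below uses hypothetical sensitivities and Riemann sums in place of integrals. It evaluates the bound once as $S(\theta_0)^{-1}/(BT)$, with $S$ from (5.13), and once through $g_i^2 = d_i/2B$ from (5.18). It also shows that the bound is inversely proportional to the assumed bandwidth $B$, which is the source of the uncertainty discussed here.

```python
import numpy as np


def cr_bound_from_S(J, d, dt, B):
    """(5.17), first form: S^{-1} / (B T), with S from (5.13).

    J has shape (samples, outputs, params); d holds the noise variances d_i.
    """
    T = J.shape[0] * dt
    S = (2.0 / T) * np.einsum("kij,kil,i->jl", J, J, 1.0 / d) * dt
    return np.linalg.inv(S) / (B * T)


def cr_bound_from_g(J, d, dt, B):
    """(5.17), second form, using g_i^2 = d_i / (2B) from (5.18)."""
    g2 = d / (2.0 * B)
    info = np.einsum("kij,kil,i->jl", J, J, 1.0 / g2) * dt
    return np.linalg.inv(info)


dt = 0.04
t = np.arange(500) * dt
# Hypothetical sensitivities: 2 outputs x 2 parameters at each sample.
J = np.stack([np.column_stack([np.sin(t), np.cos(2 * t)]),
              np.column_stack([np.exp(-0.2 * t), t / 20.0])], axis=1)
d = np.array([0.01, 0.04])
V1 = cr_bound_from_S(J, d, dt, B=1.0)
V2 = cr_bound_from_g(J, d, dt, B=1.0)
```

With $B = 12.5$ Hz in place of $1$ Hz, every entry of the computed bound shrinks by the same factor of 12.5.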


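b. Discrete-Time Theory

In the discrete-time theory, the data are sampled periodically at $2B$
samples per second, and the noise samples are assumed to be independent.
We then have the discrete version of (4.1):

$$\sum_{i=1}^{n}\log d_i + \sum_{i=1}^{n}\frac{1}{d_i}\,\frac{1}{N}\sum_{k=1}^{N}\big(\hat{Y}_i(\theta,k\Delta t) - Y_i(k\Delta t)\big)^2 \qquad (5.20)$$

where

$$\Delta t = \frac{1}{2B}, \qquad N = \frac{T}{\Delta t}.$$

Taking the gradient with respect to $d_i$ yields

$$d_i = \frac{1}{N}\sum_{k=1}^{N}\big(\hat{Y}_i(\theta,k\Delta t) - Y_i(k\Delta t)\big)^2 \qquad (5.21)$$

The parameters in $\theta$ are then determined so as to minimize

$$\sum_{i=1}^{n}\frac{1}{d_i}\,\frac{1}{N}\sum_{k=1}^{N}\big(\hat{Y}_i(\theta,k\Delta t) - Y_i(k\Delta t)\big)^2,$$

or by using the algorithm

$$\theta_{m+1} = \theta_m - S(\theta_m)^{-1}\,G(\theta_m) \qquad (5.22)$$

where

$$G(\theta) = \frac{2}{N}\sum_{i=1}^{n}\frac{1}{d_i}\sum_{k=1}^{N}\frac{\partial\hat{Y}_i}{\partial\theta}(\theta,k\Delta t)\,\big(\hat{Y}_i(\theta,k\Delta t) - Y_i(k\Delta t)\big),$$

$$S(\theta) = \frac{2}{N}\sum_{i=1}^{n}\frac{1}{d_i}\sum_{k=1}^{N}\frac{\partial\hat{Y}_i}{\partial\theta}(\theta,k\Delta t)\,\frac{\partial\hat{Y}_i}{\partial\theta}(\theta,k\Delta t)^{*}.$$

The estimate is again asymptotically unbiased and consistent. The variance
of the estimate is

$$\mathrm{var}\,[\hat\theta - \theta_0] \approx \Big[\sum_{i=1}^{n}\frac{1}{d_i}\sum_{k=1}^{N}\frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,k\Delta t)\,\frac{\partial\hat{Y}_i}{\partial\theta}(\theta_0,k\Delta t)^{*}\Big]^{-1} = \frac{2}{N}\,S(\theta_0)^{-1} \qquad (5.23)$$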


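The difficulty of needing to know the spectral density is somewhat hidden
in the 'discrete-time' analysis. Here the sampling rate is $2B$ samples/sec,
and the inverse of the variance in (5.23) can be written as

$$\sum_{i=1}^{n}\frac{1}{d_i}\sum_{k=1}^{N}\frac{\partial\hat{Y}_i}{\partial\theta}\frac{\partial\hat{Y}_i}{\partial\theta}^{*} = \sum_{i=1}^{n}\frac{2B}{d_i}\sum_{k=1}^{N}\frac{\partial\hat{Y}_i}{\partial\theta}\frac{\partial\hat{Y}_i}{\partial\theta}^{*}\,\Delta t .$$

But

$$\sum_{k=1}^{N}\frac{\partial\hat{Y}_i}{\partial\theta}(\theta,k\Delta t)\frac{\partial\hat{Y}_i}{\partial\theta}(\theta,k\Delta t)^{*}\,\Delta t \approx \int_0^T \frac{\partial\hat{Y}_i}{\partial\theta}(\theta,t)\frac{\partial\hat{Y}_i}{\partial\theta}(\theta,t)^{*}\,dt,$$

so (assuming the sampling rate is adequate) this yields the continuous-time
expression in (5.17). The implicit assumption is that

$$g_i^2 = \frac{d_i}{2B}.$$

The error here is that $2B$ is overestimated. As a result, $g_i^2$ is
underestimated, and the C-R bound is also underestimated by a sizeable
factor. The customary 50 samples/sec, for example, implies a bandwidth of 25
Hertz. This can be as much as ten times the actual bandwidth observed in the
residual. Moreover, any discrete-time (or sampled) theory requires the noise
samples to be independent from sample to sample. This is less likely the
higher the sampling rate (or the assumed noise bandwidth).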
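The variance in (5.17) is proportional to $1/B$. A C-R standard deviation computed with the bandwidth implied by the sampling rate can therefore be rescaled to another bandwidth estimate by the factor $\sqrt{B_{\text{assumed}}/B_{\text{true}}}$. The helper below is a hypothetical convenience function, not part of the LRC or OSW programs. Its second example uses the F-8 figures quoted in section 5c: 25 samples/sec and a residual bandwidth of about 1 Hertz.

```python
import math


def cr_sd_inflation(samples_per_sec, true_one_sided_bw_hz):
    """Factor by which a C-R standard deviation computed with
    B = (sampling rate) / 2 must be multiplied when the actual
    one-sided noise bandwidth is true_one_sided_bw_hz."""
    assumed_bw = samples_per_sec / 2.0     # the customary choice 2B = sampling rate
    return math.sqrt(assumed_bw / true_one_sided_bw_hz)


print(round(cr_sd_inflation(50, 2.5), 2))   # 3.16: bandwidth overestimated ten-fold
print(round(cr_sd_inflation(25, 1.0), 2))   # 3.54: F-8 figures
```

The correction applies to standard deviations. The variances themselves scale by the square of these factors.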




c. Illustration

The F-8 data furnish a good example of the important role played by the
noise bandwidth in the calculation of the C-R bound. The four runs (4, 5,
20 and 21) are at essentially the same flight conditions, which are listed
in Table 6.

To check the calculations made with actual data, a simulated test was run
first. It used the converged values from case 21 as the 'true' values and
the same control input as in case 21. Four different cases were computed
with independent noise samples having the same total power as in run 21,
but with the (one-sided) bandwidths set to exactly $1/2\Delta t$, $\Delta t$
being the sampling interval. The OSW extraction program was then used to
estimate the coefficients from each of the four simulation runs, yielding
four independent estimates of each coefficient. The sample standard
deviation $\sigma$ calculated from these four samples was then compared
with the calculated C-R bound averaged over the four cases. (The bounds
were in fact very nearly the same for the four cases.) The comparison is
given in Table 7. The last column of this table lists the ratio of $\sigma$
to the C-R bound (standard deviation). [Control derivatives are not
included, since only two estimates were available for these.] The ratio is
very close to unity, and the agreement is excellent considering the small
sample size.

The same comparison was then carried out using the F-8 flight data. The
results are given in Table 8. Following current practice, the C-R bounds
were calculated on the basis of a (one-sided) bandwidth of $1/2\Delta t$,
just as in the simulated case. In striking contrast to the simulated data,
the ratio of $\sigma$ to the C-R bound (s.d.) is now roughly of the order of
10.
Our theory explains this discrepancy: the actual noise bandwidth is much
smaller than the arbitrary and incorrect specification of the bandwidth as
$1/2\Delta t$.

Figures 4, 5, 6, 7 and 8 show the p.s.d. of the residuals for the F-8
flight data. From these, the true (one-sided) bandwidth may be assessed at
about 1 Hertz, in contrast to $1/2\Delta t$, which is 12-1/2 Hertz.

Table 9 shows the C-R bounds calculated for the Cessna runs 9, 10, 11 and
12, based on a noise bandwidth of $1/2\Delta t$. For p.s.d.'s of the Cessna
data, see section 8.
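The bandwidth reading from a residual p.s.d. can be automated in many ways. The sketch below uses one hypothetical criterion: the frequency below which a chosen fraction (here 90%) of the residual power lies, computed from a raw periodogram with NumPy's FFT routines. Both the criterion and the 90% level are assumptions made for illustration. The report itself reads the bandwidth from the plotted p.s.d.

```python
import numpy as np


def one_sided_bandwidth(residual, samples_per_sec, power_fraction=0.9):
    """Frequency (Hz) below which `power_fraction` of the residual power lies."""
    r = np.asarray(residual, dtype=float)
    r = r - r.mean()                                       # remove any bias
    power = np.abs(np.fft.rfft(r)) ** 2                    # raw periodogram
    freqs = np.fft.rfftfreq(r.size, d=1.0 / samples_per_sec)
    cumulative = np.cumsum(power) / np.sum(power)
    return float(freqs[np.searchsorted(cumulative, power_fraction)])


fs = 25.0                                                  # F-8 sampling rate
t = np.arange(500) / fs                                    # 20 s of data
print(one_sided_bandwidth(np.sin(2 * np.pi * 1.0 * t), fs))  # 1.0
```

On real residuals a smoothed p.s.d. estimate (for example, averaged periodograms) would give a steadier reading than the raw periodogram used here.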
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6. ERROR BOIMIS; NON-STATISTICAL THEORY 


It is possible to develop an interpretation of the accuracy of the extracted parameter values without invoking any statistical notions, based entirely on the minimization of the functional (4.1). The functional itself can be interpreted without involving any notion of noise. Thus the second term of the cost functional is recognized as an "output fit error", normalized by the weighting parameters $\{d_i\}$. The latter can reflect our relative degree of confidence in each of the different instruments.

The uncertainty in the estimates may be evaluated in the following way: how much can we change $\hat\theta$ while keeping the change in the cost functional within a fixed percentage? In other words (looking at $\theta$ alone for the time being), how 'large' can we make $z$ in

$$\hat\theta \to \hat\theta + z$$

while keeping ($f(\cdot)$ denoting the cost functional used)

$$|f(\hat\theta + z) - f(\hat\theta)| \le c\, f(\hat\theta),$$

where $c$ is a fixed fraction (say 1%)? How 'large' depends on the measure we choose. In general, it may be measured by the square of a linear weighted sum, more compactly expressed as

$$[Lz, Lz] = \|Lz\|^2,$$

where $L$ is a given rectangular matrix. Moreover, since the changes are small, we may approximate the cost functional by retaining only the linear and quadratic terms:

$$f(\hat\theta + z) = f(\hat\theta) + [G(\hat\theta), z] + \tfrac{1}{2}[S(\hat\theta)z, z].$$

Since the $d_i$ are fixed (for the time being), $f(\theta) = q(\theta)$; moreover $f(\hat\theta) = n$ and $G(\hat\theta) = 0$, so that

$$f(\hat\theta + z) - f(\hat\theta) = \tfrac{1}{2}[S(\hat\theta)z, z],$$

and the condition becomes $[S(\hat\theta)z, z]/(2n) \le c$.
Hence we have the problem of maximizing $[Lz, Lz]$ subject to

$$\frac{[S(\hat\theta)z, z]}{2n} = c.$$

This problem is readily solved by using a Lagrange multiplier $\lambda$ and maximizing

$$[Lz, Lz] - \lambda [S(\hat\theta)z, z] = [(L^*L - \lambda S(\hat\theta))z, z],$$

so that the optimal $z$ will satisfy

$$L^*Lz = \lambda S(\hat\theta)z,$$

and hence the answer to our problem is

$$\max \|Lz\|^2 = 2nc\,\bigl(\text{largest eigenvalue of } S(\hat\theta)^{-1}L^*L\bigr) = \frac{2nc}{\text{smallest eigenvalue of } (L^*L)^{-1}S(\hat\theta)} \qquad (6.1)$$

according as $S(\hat\theta)$ is non-singular or $L^*L$ is non-singular. Also, we may replace in (6.1)

$$S(\hat\theta)^{-1}L^*L \quad\text{by}\quad \sqrt{L^*L}\,S(\hat\theta)^{-1}\sqrt{L^*L},$$

$$(L^*L)^{-1}S(\hat\theta) \quad\text{by}\quad (\sqrt{L^*L})^{-1}S(\hat\theta)(\sqrt{L^*L})^{-1},$$

which are symmetric and have the same eigenvalues.
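Equation (6.1) is a generalized symmetric eigenvalue problem, $L^*Lz = \lambda S(\hat\theta)z$. A minimal NumPy sketch, assuming $S(\hat\theta)$ is positive definite so that a Cholesky factor $S = RR^*$ exists; the symmetric matrix $R^{-1}L^*LR^{-*}$ has the same eigenvalues as $S(\hat\theta)^{-1}L^*L$ and plays the role of the symmetrized forms just given. The function name is hypothetical.

```python
import numpy as np

def max_fit_uncertainty(S, L, n, c):
    """Largest ||Lz||^2 such that 0.5 * [S z, z] <= c * n  (eq. 6.1).

    S -- sensitivity matrix S(theta_hat), assumed symmetric positive definite
    L -- weighting matrix (any number of rows)
    n -- number of outputs; c -- allowed fractional cost change (e.g. 0.01)
    """
    S = np.asarray(S, dtype=float)
    L = np.atleast_2d(np.asarray(L, dtype=float))
    R = np.linalg.cholesky(S)              # S = R R^T
    R_inv = np.linalg.inv(R)
    C = R_inv @ (L.T @ L) @ R_inv.T        # symmetric, same eigenvalues as S^-1 L*L
    return 2.0 * n * c * np.linalg.eigvalsh(C).max()
```

With $L = I$ the result is $2nc$ divided by the smallest eigenvalue of $S(\hat\theta)$; with $L$ a single coordinate row it gives the diagonal element of $S(\hat\theta)^{-1}$ scaled by $2nc$.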


We shall now consider various special cases of $L$:


Case 1

$z$ is required to be of the form $k e_i$, where $e_i$ is a coordinate vector and $k$ is a scalar. In other words, we perturb one particular component variable only. Then

$$\max \|Lz\|^2 = \frac{2nc}{s_{ii}}, \qquad S(\hat\theta) = \{s_{ij}\}.$$

Thus the smaller the diagonal elements in the 'sensitivity' matrix $S(\hat\theta)$, the larger the uncertainty. The correlation in the $S(\hat\theta)$ matrix plays no role.

But it does, as soon as we consider

Case 2

$$\|Lz\|^2 = [z, e_i]^2 = z_i^2,$$

where $e_i$ is a coordinate vector with 1 for the $i$th component and zero elsewhere. Although the measure of uncertainty is based only on the $i$th component of $z$, all components of $z$ are allowed to change. In this case the answer is

$$\max \|Lz\|^2 = 2nc\, d_{ii}, \qquad (6.2)$$

where

$$\{d_{ij}\} = S(\hat\theta)^{-1}.$$

Note that now the answer is different from case 1 as soon as $S(\hat\theta)$ is not diagonal. In fact, we note the elementary inequality

$$d_{ii} \ge \frac{1}{s_{ii}},$$

and equality holds for all $i$ if and only if $S(\hat\theta)$ is diagonal. Note the connection with the statistical variance measure: we have in fact the C-R bound, but with an arbitrary constant of proportionality. For a similar approach, see Klein [6] and, even earlier, Shinbrot [7].
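The inequality $d_{ii} \ge 1/s_{ii}$ is easy to check numerically. A short sketch (hypothetical helper name, illustrative matrices) returning the case 1 and case 2 bounds side by side for every parameter:

```python
import numpy as np

def case1_case2_bounds(S, n, c):
    """Per-parameter bounds on z_i^2: case 1 (2nc/s_ii) and case 2 (2nc d_ii)."""
    S = np.asarray(S, dtype=float)
    case1 = 2.0 * n * c / np.diag(S)                 # only z_i is perturbed
    case2 = 2.0 * n * c * np.diag(np.linalg.inv(S))  # all components free to move
    return case1, case2

c1, c2 = case1_case2_bounds([[2.0, 1.0], [1.0, 2.0]], n=1, c=0.5)
print(c1, c2)   # case 2 exceeds case 1 because S is not diagonal
```

For a diagonal $S$ the two sets of bounds coincide.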

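Let us next consider a case in which correlation in the $S(\hat\theta)^{-1}$ matrix plays a role:

Case 3

$$\|Lz\|^2 = z_i^2 + z_j^2.$$

Let $\gamma$ be the $i$-$j$th correlation coefficient:

$$\gamma = \frac{d_{ij}}{\sqrt{d_{ii}\,d_{jj}}}.$$

Then for

$$\gamma = 0: \quad \max \|Lz\|^2 = 2nc\,\max(d_{ii}, d_{jj}),$$

$$\gamma = \pm 1: \quad \max \|Lz\|^2 = 2nc\,(d_{ii} + d_{jj}).$$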

It is interesting to note that the $z$ that achieves the maximum (for $\gamma = \pm 1$) has the form

$$z_i = k, \qquad z_j = \pm k\sqrt{d_{jj}/d_{ii}}.$$

In other words, we may choose $z_i$ arbitrarily provided we also make $z_j$ a fixed multiple of $z_i$. Such a possible linear relationship has been noted in reference [8]. Correlation makes the uncertainty worse.
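Case 3 reduces to the largest eigenvalue of the $2\times 2$ block of $\{d_{ij}\}$ for parameters $i$ and $j$, which always lies between the $\gamma = 0$ value $2nc\max(d_{ii}, d_{jj})$ and the $\gamma = \pm 1$ value $2nc(d_{ii} + d_{jj})$. A sketch with a hypothetical function name:

```python
import numpy as np

def case3_bound(S, i, j, n, c):
    """Correlation gamma and max (z_i^2 + z_j^2) subject to 0.5*[S z, z] <= c*n."""
    D = np.linalg.inv(np.asarray(S, dtype=float))    # {d_ij} = S^-1
    gamma = D[i, j] / np.sqrt(D[i, i] * D[j, j])
    block = D[np.ix_([i, j], [i, j])]                # 2x2 block of S^-1
    return gamma, 2.0 * n * c * np.linalg.eigvalsh(block).max()
```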

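Finally, we may consider the case where we weight the $z_i$ inversely with respect to the $\hat\theta$ values:

Case 4

$$\|Lz\|^2 = \sum_i \left(\frac{z_i}{\hat\theta_i}\right)^2, \qquad \hat\theta = \{\hat\theta_i\}.$$

In this case

$$\max \|Lz\|^2 = 2nc\,\bigl(\text{largest eigenvalue of } a\,S(\hat\theta)^{-1}\bigr) = \frac{2nc}{\text{smallest eigenvalue of } a^{-1}S(\hat\theta)},$$

where

$$a = \operatorname{diag}\left[1/\hat\theta_1^2, \ldots, 1/\hat\theta_p^2\right].$$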
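Case 4 is what Table 10 reports: the largest relative perturbation and the corresponding optimal $z$ expressed as a percentage of $\hat\theta$. A sketch under the same positive-definiteness assumption as for (6.1); the optimal $z$ is recovered from the top eigenvector $w$ of $R^{-1}aR^{-*}$ as $z = R^{-*}w$, scaled so that $[S(\hat\theta)z, z] = 2nc$. Names are hypothetical and the sign of $z$ is arbitrary.

```python
import numpy as np

def case4_relative(S, theta_hat, n, c):
    """Max sum (z_i/theta_i)^2 with 0.5*[S z, z] <= c*n, and the optimal z in percent."""
    S = np.asarray(S, dtype=float)
    th = np.asarray(theta_hat, dtype=float)
    a = np.diag(1.0 / th ** 2)                  # a = diag[1/theta_1^2, ...]
    R = np.linalg.cholesky(S)                   # S = R R^T
    R_inv = np.linalg.inv(R)
    vals, vecs = np.linalg.eigh(R_inv @ a @ R_inv.T)
    w = vecs[:, -1] * np.sqrt(2.0 * n * c)      # eigenvector of the largest eigenvalue
    z = R_inv.T @ w                             # then [S z, z] = w.w = 2nc
    return 2.0 * n * c * vals[-1], 100.0 * z / th
```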


Table 10 shows the results for Cessna run 9 with a 1% cost change. The second and third columns show the calculation for cases 1 and 2; the maximal uncertainties are shown as a percentage of the calculated $\hat\theta$ value. The final column shows the calculation for case 4; the values of the corresponding optimal $z$ vector are given as a percentage of the estimated parameter values. These values appear rather high for only a 1% change in cost, and indicate in particular that run 9 was of poor quality. From the C-R bounds given in Table 9, we see that the aileron runs are generally better than the rudder runs.

We may also study the uncertainty in the estimated $d_i$ values. The minimal value of (4.1) is given by

$$\sum_i \log \hat d_i + n,$$

where

$$\hat d_i = \frac{1}{T}\int_0^T \bigl(\hat Y_i(\hat\theta, t) - Y_i(t)\bigr)^2\,dt.$$

Accounting for both $\hat\theta$ and the $\hat d_i$, we can write, up to second-order terms,

$$f(\hat\theta + z, \hat d_i + y_i) - f(\hat\theta, \hat d_i) = \tfrac{1}{2}[S(\hat\theta)z, z] + \tfrac{1}{2}\sum_i \frac{y_i^2}{\hat d_i^2}.$$

Hence

$$\max y_i^2 = 2nc\,\hat d_i^2,$$

and this result is consistent with $\hat d_i$ being unbiased with variance proportional to $d_i^2$. The uncertainty in the $\hat d_i$ is proportional to $d_i$ itself.
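The step from the expansion to $\max y_i^2 = 2nc\,\hat d_i^2$ can be made explicit. Assuming, consistently with the minimal value $\sum_i \log \hat d_i + n$ of (4.1), that at fixed $\hat\theta$ the $d_i$-dependence of the functional is $\sum_i (\log d_i + \hat d_i/d_i)$, and using the same cost budget $cn$ as for $\theta$:

```latex
f_i(d_i) = \log d_i + \frac{\hat d_i}{d_i}, \qquad
f_i'(d_i) = \frac{1}{d_i} - \frac{\hat d_i}{d_i^2}, \qquad
f_i''(\hat d_i) = -\frac{1}{\hat d_i^2} + \frac{2\hat d_i}{\hat d_i^3} = \frac{1}{\hat d_i^2},
\\[4pt]
f_i(\hat d_i + y_i) - f_i(\hat d_i) \approx \frac{y_i^2}{2\hat d_i^2} \le c\,n
\quad\Longrightarrow\quad
y_i^2 \le 2nc\,\hat d_i^2 .
```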
7. CONVERGENCE IMPROVEMENT TECHNIQUES

The basic computational problem is the minimization of $q(\theta)$ defined by (4.5). Let $\theta_0$ denote the starting value of the parameters, including both the 'linear' and the 'non-linear' ones, and let $\theta_1$ denote the end of the first iteration using the N-R algorithm when only the linear terms are allowed to vary. The full set of terms, including the non-linear ones, is allowed to vary next using the N-R algorithm, predicated on the assumption that the Hessian given by (4.17) is positive definite and that $\theta_1$ is "sufficiently close" to the true [or minimizing] value $\theta_T$. In fact, we know that $\|\theta_2 - \theta_T\| \le k\,\|\theta_1 - \theta_T\|^2$, where $k$ is a constant involving third derivatives of $q(\cdot)$. It is difficult, if not impossible, to determine the closeness in any calculable quantitative way. On the other hand, if the closeness condition is not satisfied, the cost functional $q(\theta)$ need not be monotone decreasing; in other words, $q(\theta)$ oscillates and we experience lack of convergence.

In practice we can determine whether $\theta_1$ is 'close enough' by running the program using the N-R algorithm. If there is lack of convergence and this cannot be traced to other sources, we should suspect that the starting values were not close enough. In other words, when non-convergence is due to poor starting values, the trouble can lie with the N-R algorithm rather than with any intrinsic defect of the maximum likelihood formulation.

The N-R algorithm should always be used on the 'linear' coefficients first, as already noted. Implementation of this is relatively easy. After computation of the gradient and the matrix $S$, a short routine can be inserted to reset the 'non-linear' terms in the gradient to zero. It should also set the off-diagonal elements in $S$ corresponding to these terms to zero and the diagonal terms to 1. This is not very efficient, in that it computes the non-linear terms and then ignores them. The computational effort is not large, however, since it applies only to the first iteration. On the other hand, it has the advantage that it is an easy, compact addition to any program, reducing to a one-line call to a short subroutine. A more efficient implementation is also possible if the program is well modularized.
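The 'short routine' described above might look as follows. This is a sketch, assuming the modified N-R update $\theta_{n+1} = \theta_n - S(\theta_n)^{-1}G(\theta_n)$ used elsewhere in this report and a boolean mask marking the 'non-linear' coefficients; the function name is hypothetical. Zeroing the masked gradient entries and replacing their rows and columns of $S$ by unit vectors makes the N-R step leave those coefficients unchanged.

```python
import numpy as np

def linear_only_step(theta, G, S, nonlinear):
    """One modified N-R step in which only the 'linear' coefficients move.

    nonlinear -- boolean mask, True for the 'non-linear' coefficients
    """
    G = np.array(G, dtype=float)            # copies, so the caller's arrays are untouched
    S = np.array(S, dtype=float)
    idx = np.flatnonzero(nonlinear)
    G[idx] = 0.0                            # reset 'non-linear' gradient terms to zero
    S[idx, :] = 0.0                         # zero their off-diagonal elements in S ...
    S[:, idx] = 0.0
    S[idx, idx] = 1.0                       # ... and set their diagonal terms to 1
    return np.asarray(theta, dtype=float) - np.linalg.solve(S, G)
```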

If the initial iteration determining the linear terms is not adequate for convergence, we may use one of two methods to improve the starting values: (a) the gradient method, or (b) the a priori weighting method.

a. The Gradient Technique

In the initial stages we may substitute the gradient technique in place of the N-R technique. In the gradient iteration we proceed as follows:

$$\theta_{n+1} = \theta_n - \gamma_n G(\theta_n),$$

where $\gamma_n$ is a numerical coefficient determining the 'step size'. The step size is not unique and can be chosen in many ways. In the 'steepest descent' version, $\gamma_n$ is determined by taking the value of $\gamma$ corresponding to

$$\min_{\gamma}\, q\bigl(\theta_n - \gamma G(\theta_n)\bigr).$$

If we omit terms higher than those of second degree in the expansion of $q(\theta)$, this would yield

$$\gamma_n = \frac{[G(\theta_n), G(\theta_n)]}{[H_n G(\theta_n), G(\theta_n)]},$$

where $H_n$ is the Hessian at $\theta = \theta_n$. Since the determination of $H_n$ involves second derivatives, we may replace $H_n$ by $S(\theta_n)$, defined by (4.11). Alternatively, fixed step sizes may be used (such as 0.6), or more time-consuming 'search' procedures may be employed.
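The steepest-descent step with the quadratic-model step size can be sketched as below, passing either the Hessian $H_n$ or, as suggested, $S(\theta_n)$ in its place (hypothetical function name; $G$ and the matrix are assumed to be supplied by the existing program).

```python
import numpy as np

def gradient_step(theta, G, H):
    """theta_{n+1} = theta_n - gamma_n G with gamma_n = [G, G] / [H G, G]."""
    G = np.asarray(G, dtype=float)
    H = np.asarray(H, dtype=float)
    gamma = (G @ G) / (G @ H @ G)          # exact line minimum of the quadratic model
    return np.asarray(theta, dtype=float) - gamma * G, gamma
```

On an exactly quadratic cost the new gradient is orthogonal to the old one, which is the defining property of an exact line search. Unlike the N-R step, this step stays well defined far from the minimum as long as $[HG, G] > 0$.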

Performance

The use of the gradient technique for initial improvement will be illustrated with the Cessna 172 flight data. As starting values we use the following non-dimensional matrices, rounded off from PA-30 data:


    -.006    0      0      0
    -.001   -.5     .1     0
     .0015  -.1    -.1     0
     0       0      0      0

B:   .001 (row partially illegible)
    -.0013   .0002
    -.00004  .0002
     0       0

     0   0   0
     0   0   0
     0   0   0
     0   0   0

The starting $d_i$ values were: 150, 7, 600, 25, 50000.
None of the four cases 9, 10, 11 or 12 converged from these values at the first attempt using the N-R algorithm. Case 11 was then singled out for detailed study.

For the first iteration, only the 'linear' coefficients were determined, as explained above. The cost functional decreased from 13750 to 5511.

Then the gradient iteration (explained above) was used on all the coefficients. The cost functional values were: 5511, 4867, 4387, 3937, 3316, 2875, 2638, 2526, 2426, 2371, 2331. The values of the 'non-linear' coefficients at the final iteration were:

$$C_{y_\beta} = -.005992, \qquad C_{l_\beta} = -.0006647, \qquad C_{n_\beta} = +.0006061$$

The other non-linear coefficients showed little change.

At this point we switched to the N-R technique (determining only the linear terms at the first iteration), and the corresponding cost functionals by iteration were: 4318, 1336 (linear only), 2634, 746, 199.6, 174, 174.

The final coefficient values (non-linear) were:

    -.004650     0
    -.0006259   -.2376
     .0003358   -.05535
     0           0

     0           0
     .00758
    -.07167      0
     0           0
b. A Priori Weighting

In this technique we modify the functional $q(\theta)$ by adding a positive semi-definite quadratic form:

$$q'(\theta) = q(\theta) + [K(\theta - \theta_0), \theta - \theta_0],$$

where $K$ is a diagonal "default" matrix with positive entries corresponding to the non-linear parameters and zero otherwise. We may interpret this as assigning an a priori Gaussian density to the non-linear parameters and taking the 'unconditional' log likelihood function. The effect is basically to keep the search for the minimum in a chosen region. It also has the effect of making the Hessian positive definite for suitably chosen $K$.

After a few iterations using $K$, one then starts all over from the parameter values reached, setting $K$ to zero thereafter.
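For the modified N-R algorithm, the added form simply augments the gradient by $2K(\theta - \theta_0)$ and the matrix $S$ by $2K$. A sketch of one such step, assuming $K$ is given by its diagonal (hypothetical names):

```python
import numpy as np

def weighted_nr_step(theta, G, S, k_diag, theta0):
    """One modified N-R step on q'(theta) = q(theta) + [K(theta - theta0), theta - theta0]."""
    theta = np.asarray(theta, dtype=float)
    K = np.diag(np.asarray(k_diag, dtype=float))   # zero for the 'linear' parameters
    dev = theta - np.asarray(theta0, dtype=float)
    G_mod = np.asarray(G, dtype=float) + 2.0 * K @ dev
    S_mod = np.asarray(S, dtype=float) + 2.0 * K   # positive definite for suitable K
    return theta - np.linalg.solve(S_mod, G_mod)
```

For instance, with $S = \operatorname{diag}(1, 0)$ the plain N-R step is undefined, whereas a positive weight on the second (non-linear) parameter makes the weighted step well defined.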

To demonstrate how this technique actually works out, we use the same Cessna case as above. A default matrix $K$ was used: diagonal, with non-zero entries as below:

    3, 1300, .15, 15, 500, 800, 5, 800

A multiplicative factor was left open. Runs were made with factors 1, 10, 100 and 1000, with no resulting convergence. Taking the factor as 100000, the following set of successive cost functionals was obtained: 13800, 5600, 5500000, 143000, 29300, 24300, 8900.

The monotone behavior after the second was encouraging. In terms of the corresponding parameter values, $C_{n_\beta}$ overshot to negative values and then recovered slowly over the monotonically decreasing portion. The corresponding actual values were

     .0015       .0015       -.000805
    -.0006847   -.0007170    -.0002629

The multiplying factor was then increased to $10^{n}$. The corresponding cost functional values became

    13800, 7800, 62000, 8160, 34200, 1347, 417.


The noticeable feature here is the smallness of the final cost functional, significantly lower than the starting value. The coefficient values at the last iteration indicated that $C_{n_\beta}$ was the only non-linear coefficient changing significantly from its starting value. Hence, the starting value of $C_{n_\beta}$ was changed from .0015 to .00032, the final value obtained. The cost functional then behaved satisfactorily:

    3225, 433, 449, 200, 174, 174, 174.

Moreover, the coefficients obtained were identical to those obtained with the gradient technique.

It can be seen that the a priori weighting technique is very ad hoc in nature and is subject to the criticism that it is "massaging the data to get the answer you want."


8. POOLING TECHNIQUE


a. Theory 

When we have a set of runs all at or near the same flight conditions, we should, from the point of view of statistical noise theory, 'pool' them in the following way (as opposed to averaging the parameter estimates obtained independently from each run). Let us number the time histories $Y^j(t)$, $j = 1, \ldots, m$, where the initial time is normalized to zero in each case, so that we have $m$ observation vectors

$$Y^j(t), \qquad 0 \le t \le T_j, \qquad j = 1, \ldots, m.$$

The main assumption is that the noise is statistically independent from run to run. This assumption is satisfied if there is a time difference of a few seconds between the end of one run and the beginning of the next; the runs are in practice obtained 'sequentially' in time anyway. The cost functional to be minimized is


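$$\sum_{i=1}^{n} \log d_i + \Bigl(\sum_{j=1}^{m} T_j\Bigr)^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{1}{d_i} \int_0^{T_j} \bigl(\hat Y_i^j(\theta, t) - Y_i^j(t)\bigr)^2\,dt, \qquad (8.1)$$

where $\hat Y_i^j(\theta, t)$ is the calculated response for fixed parameters $\theta$.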

The form of (8.1) is derived from the fact that the conditional probability factors:

$$P[Y^1, Y^2, \ldots, Y^m \mid \theta] = P[Y^1 \mid \theta] \cdots P[Y^m \mid \theta],$$

which is in turn based on the independence of the noise processes from run to run. The noise variances are of course taken to be the same from run to run. The bias terms and the initial conditions are allowed to depend on the run. Only the aircraft stability and control derivatives are fixed for all the runs.

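We have:

$$\hat d_i = \Bigl(\sum_{j=1}^{m} T_j\Bigr)^{-1} \sum_{j=1}^{m} \int_0^{T_j} \bigl(\hat Y_i^j(\theta, t) - Y_i^j(t)\bigr)^2\,dt, \qquad (8.2)$$

and we minimize, for fixed $\{d_i\}$,

$$q(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{1}{d_i} \int_0^{T_j} \bigl(\hat Y_i^j(\theta, t) - Y_i^j(t)\bigr)^2\,dt \qquad (8.3)$$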


with respect to all the other unknown parameters. We have again the modified N-R algorithm:

$$\theta_{n+1} = \theta_n - S(\theta_n)^{-1} G(\theta_n),$$

where $\theta$ is of course a vector of the form

$$\theta = \operatorname{col}\bigl[\alpha_i^j, \alpha_k\bigr], \qquad j = 1, \ldots, m; \quad i = 1, \ldots, p; \quad k = 1, \ldots, r.$$



The double-indexed parameters are the bias terms and initial conditions, which enter linearly and are allowed to vary from run to run. The gradient $G(\theta)$ is the column vector of partial derivatives $\partial q(\theta)/\partial\alpha$, where $\alpha, \beta$ stand for the parameters.

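The main question that remains is the C-R bound for the aircraft parameters. We have the expansion, $\theta_0$ denoting the true total parameter vector:

$$\hat\theta - \theta_0 = -S(\theta_0)^{-1} G(\theta_0), \qquad (8.6)$$

from which we calculate that

$$E\bigl[(\hat\theta - \theta_0)(\hat\theta - \theta_0)^*\bigr] = S(\theta_0)^{-1}\, E\bigl[G(\theta_0)G(\theta_0)^*\bigr]\, S(\theta_0)^{-1}, \qquad (8.7)$$

where $E[G(\theta_0)G(\theta_0)^*]$ is evaluated in terms of the noise spectral densities

$$g_i^j = \text{spectral density of } N_i^j(t) = \frac{d_i}{2B_i^j}.$$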

This is a little complicated and can be simplified further if we assume that the bandwidths are the same,

$$B_i^j = B,$$

and, of course, that the variances are the same,

$$E\bigl[N_i^j(t)^2\bigr] = d_i.$$

In that case

$$E[\hat d_i] = d_i.$$

Hence, replacing $d_i$ by $\hat d_i$ in the formula (8.5) for $S(\theta)$, the expression (8.7) simplifies to


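$$E\bigl[(\hat\theta - \theta_0)(\hat\theta - \theta_0)^*\bigr] = \Bigl\{\sum_{i=1}^{n}\sum_{j=1}^{m}\frac{1}{g_i}\int_0^{T_j}\frac{\partial \hat Y_i^j(\theta_0, t)}{\partial\theta}\,\frac{\partial \hat Y_i^j(\theta_0, t)^*}{\partial\theta}\,dt\Bigr\}^{-1}, \qquad (8.8)$$

where we emphasize that

$$2B\, g_i = d_i.$$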
The 'pooled' C-R bound is thus the inverse of a sum over all runs and outputs, which is also proportional to the inverse of

$$\sum_{j=1}^{m} S_j, \qquad (8.9)$$

where the $S_j$ are the individual sensitivity matrices. Note that if we took as the composite estimate the average of the $m$ determinations, the variance would be

$$\frac{1}{m^2}\sum_{j=1}^{m} S_j^{-1}, \qquad (8.10)$$

and of course this variance would be larger:

$$\Bigl(\sum_{j=1}^{m} S_j\Bigr)^{-1} \le \frac{1}{m^2}\sum_{j=1}^{m} S_j^{-1},$$

the improvement being bigger, the farther apart the individual matrices are. Note also the advantage that (8.9) will tend to be farther from singular than the individual matrices.
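Relations (8.9) and (8.10) are easy to compare numerically. A sketch, assuming the individual sensitivity matrices $S_j$ are available as arrays and dropping the common proportionality constant; the function name and the example matrices are hypothetical:

```python
import numpy as np

def pooled_vs_averaged(S_list):
    """Pooled bound (sum S_j)^-1 versus the variance of the averaged estimate."""
    S_list = [np.asarray(S, dtype=float) for S in S_list]
    m = len(S_list)
    pooled = np.linalg.inv(sum(S_list))                          # (8.9)
    averaged = sum(np.linalg.inv(S) for S in S_list) / m ** 2    # (8.10)
    return pooled, averaged

p, a = pooled_vs_averaged([np.diag([4.0, 0.25]), np.diag([0.25, 4.0])])
print(np.diag(p), np.diag(a))   # pooled diagonal is much smaller
```

With identical $S_j$ the two results coincide. The sum can also be non-singular when every individual $S_j$ is singular, in which case only the pooled bound exists.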

b. Performance

Performance of this technique was tested on the Cessna 172 flight data. The rudder input cases (9 and 10) yield poorer results than the aileron cases (11 and 12). Since the flight conditions are very close, it seems natural to pool the rudder and aileron cases. Thus, run 9 was pooled with run 11 (designated 9-11) and run 10 was pooled with run 12 (designated 10-12). The estimates of the coefficients and the C-R bounds are presented in Table 11. The fits are shown in Figures 9-12 and the residuals in Figures 13-16. We note first that there is a significant reduction in the scatter of the estimates compared with the individual runs. The agreement between the two pooled runs is also quite good. The C-R bounds are slightly better than for the individual aileron runs and significantly better than for the individual rudder runs. These bounds were calculated on the basis of a (one-sided) bandwidth of $1/(2\Delta t)$ (5 Hertz).

Figures 17-21 show the power spectral density of the residuals, based on the first 512 time points of the combined residuals from all four cases. Based on these plots, a bandwidth of around 1 Hertz or less would be reasonable. This would indicate that the C-R bounds should be at least 2 or 3 times larger. The resulting error estimates are consistent with the estimate scatter.
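The factor quoted above follows from the white-noise model of this section: with $g = d/(2B)$ the C-R covariance scales as $1/B$, so the standard deviations scale as $\sqrt{B_{\text{assumed}}/B_{\text{actual}}}$. A small sketch (hypothetical function names); the 0.1 s sampling interval is simply the one implied by $1/(2\Delta t) = 5$ Hz:

```python
import math

def nyquist_bandwidth(dt):
    """One-sided bandwidth 1/(2 dt) in Hz, as used in current practice."""
    return 1.0 / (2.0 * dt)

def cr_sd_inflation(b_assumed, b_actual):
    """Factor by which C-R standard deviations grow if the true noise
    bandwidth is b_actual rather than b_assumed (C-R variance ~ 1/B)."""
    return math.sqrt(b_assumed / b_actual)

b_nominal = nyquist_bandwidth(0.1)                             # 5.0 Hz for the Cessna data
print(b_nominal, round(cr_sd_inflation(b_nominal, 1.0), 2))    # 5.0 2.24
```

A true bandwidth below 1 Hz gives a larger factor, which is why the text says "at least".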


9. QUESTIONS AND ANSWERS 


In this section we shall provide explicit answers to the questions raised by LRC in the Statement of Work. The theory leading to the answers has been presented in sections 4 through 8 of this report and will be drawn upon as needed.

Question 1 - Does non-uniqueness of derivatives always occur with high correlation?

Answer - With $S(\theta)$ as defined in this report, one may refer to high correlation in $S(\theta)$ or, as LRC suggests, in its inverse, the dispersion matrix. Uncertainty in derivative extraction can be interpreted either with the statistical theory (section 4) or with the non-statistical theory (section 6). In either case we have shown that the diagonal elements in the dispersion matrix provide a direct measure of this uncertainty, which is higher the higher the correlation in the $S(\theta)$ matrix. Also, correlation in $S(\theta)^{-1}$ has the additional effect of increasing this uncertainty. In this sense, high correlation is indicative of non-uniqueness of derivatives; see case 3 of section 6.

Question 2 - Can we tell which parameters are 'observable'? a) What statistics or what part of the program can we interrogate to find out? b) Why won't the variances tell us if a parameter is not observable?

Answer - The observability of a parameter is measured by the corresponding diagonal entry in the dispersion matrix $S(\theta)^{-1}$. The interpretation of this can be statistical or non-statistical. However, this entry is uncertain within a multiplicative factor owing to the uncertainty in the noise bandwidth (see section 4). Hence, an 'absolute' (as opposed to relative) quantitative use of the variance can be misleading (and may be inconsistent with the observed scatter).

Question 3 - When two parameters are highly correlated, we can vary either parameter over a wide range, provided we compensate by changing the other parameter. Since we can vary the parameters in this manner, why do we converge on one value for each parameter, and still have indicated low variance for each parameter?

Answer - By 'parameter correlation' what is meant here is the correlation in the dispersion matrix $S^{-1}$. We have shown in section 6 that when such correlation occurs we may change either parameter linearly with respect to the other with little or no change in the cost functional being minimized. However, how much the parameter can be changed in this manner is still determined by the sum of the corresponding variances, as indicated in section 6 (case 3).

Question 4 - Under what conditions do correlations of parameters occur? Is it caused by correlations of states? Is there some other reason?

Answer - Again, by 'correlation of parameters' is meant the correlation in the $S^{-1}$ matrix. In the statistical theory, this correlation is actually the correlation in the error covariance and not in the parameters themselves. The parameters are not conceived as random variables. In any event, the correlation in the dispersion matrix is not due to correlation in the states, and does not have any direct interpretation other than as indicating 'stiffness' in the dispersion matrix (large eigenvalue spread).

On the other hand, correlation in the matrix $S(\theta)$ would indicate closeness to singularity and hence largeness of the C-R bounds. Also, the 'two-by-two' correlations by themselves need not be large, and yet sub-determinants may be zero or close to zero. Hence, singling out two-by-two correlations for attention does not appear to be of much direct relevance.
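A three-parameter example makes the last point concrete: every pairwise correlation below is only $-0.5$, yet the matrix is singular (its rows sum to zero). The numbers are purely illustrative.

```python
import numpy as np

R = np.array([[1.0, -0.5, -0.5],
              [-0.5, 1.0, -0.5],
              [-0.5, -0.5, 1.0]])

pair_dets = [np.linalg.det(R[np.ix_(p, p)]) for p in ([0, 1], [0, 2], [1, 2])]
print(pair_dets)                  # each 2x2 sub-determinant equals 0.75 (up to rounding)
print(np.linalg.det(R))           # essentially zero: R is singular
print(np.linalg.eigvalsh(R))      # eigenvalues 0, 1.5, 1.5 (up to rounding)
```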

Question 5 - What does 'correlation' between parameters and/or states really mean?

Answer - Correlation between states is apparently interpreted as in the following type of situation:

$$p = kq$$

(the time history $p(t)$ is a constant multiple of $q(t)$). More generally, if $x(t)$ represents the state, then we may consider

$$[v, x(t)] = 0, \qquad 0 \le t \le T,$$

for some non-zero vector $v$. Presumably this is satisfied at the true parameter values. But this has nothing to do with the dispersion matrix, since the latter is determined by partial derivatives with respect to the parameters. In other words, it does not follow, for example, that the sensitivity matrix $S(\theta)$ is singular.


Question 6 - If the correlation between $C_{n_\beta}$ and $C_{n_r}$ is 1:1, will an incremental change in $C_{n_\beta}$ produce the same effect as an identical incremental change in $C_{n_r}$? If the correlation between two derivatives is less than 1.0, what does it mean?

Answer - Continuing to interpret correlation as that in the dispersion matrix, it is clear first of all that a correlation of $\pm 1$ cannot occur, since otherwise the matrix would be singular. On the other hand, if the correlation is merely "close to one", then the implication is only that the matrix is 'stiff'. If the correlation in the $S$ matrix is taken, however, occurrence of exactly $\pm 1$ would mean that $S$ has a zero eigenvalue, and hence, if $\hat\theta$ is the optimal estimate,

$$G(\hat\theta) = G(\hat\theta + \lambda e),$$

where $e$ is the eigenvector corresponding to the zero eigenvalue, and hence also

$$q(\hat\theta) = q(\hat\theta + \lambda e)$$

to the second-order approximation. Thus, if the 'correlation between $C_{n_\beta}$ and $C_{n_r}$ is +1' is interpreted to mean that the corresponding correlation in the $S$ matrix is 1, then (with $S$ normalized to unit diagonal) the eigenvector $e$ has zero entries except in the $C_{n_\beta}$, $C_{n_r}$ places, where it is $1/\sqrt{2}$, $-1/\sqrt{2}$, and we may keep the cost functional the same by proportionately changing $C_{n_\beta}$ and $C_{n_r}$. This statement continues to be approximately true if the correlation is sufficiently close to $\pm 1$.
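In normalized form (unit diagonal) the statement can be checked directly: with correlation $+1$ the zero-eigenvalue eigenvector is $(1/\sqrt2, -1/\sqrt2)$ up to sign, and the second-order cost change $\tfrac12[Sz, z]$ vanishes along it; with correlation 0.99 it is merely small. The helper name and numbers are illustrative only.

```python
import numpy as np

def cost_change_along_weakest(rho, step):
    """Eigen-decomposition of [[1, rho], [rho, 1]] and 0.5*[S z, z] along its
    weakest eigenvector, z = step * e."""
    S = np.array([[1.0, rho], [rho, 1.0]])
    vals, vecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    e = vecs[:, 0]                         # eigenvector of the smallest eigenvalue
    z = step * e
    return vals, e, 0.5 * z @ S @ z

print(cost_change_along_weakest(1.0, 0.3))    # (numerically) zero eigenvalue and cost change
print(cost_change_along_weakest(0.99, 0.3))   # small positive cost change
```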

Question 7 - Is it possible to specify flight-test techniques that will minimize correlations and maximize observability? Is there any analytical basis for determining the best surface to use, and the best control-input time history, to minimize correlations? Will control to minimize correlations also maximize sensitivity parameters, or will minimizing correlations also minimize state sensitivity to the parameters?

Answer - It is quite possible to require that the input $u(\cdot)$ be such as to make the correlations in the $S$ matrix or the dispersion matrix small. However, the only real analytical basis for determining the best surface to use will be to require that the trace of the dispersion matrix be minimized. The optimal solution corresponding to this criterion will have a trace no larger than that of the case where the correlations in the $S$ matrix are zero.

Minimizing correlations in the dispersion matrix will mean very little if the trace is unaffected; see section 6, case 3.
10. CONCLUDING REMARKS


By way of conclusion, some of the specific recommendations for improving current parameter extraction programs will now be itemized.

1. Using approximate noise variances where available, or otherwise making them all the same, the first iteration should vary only the 'linear' coefficients. Using the residuals as estimates for the noise variances, all the parameters are allowed to vary from then on, until convergence is obtained. The new set of residuals is then used to repeat the above procedure until the residuals stabilize.

2. If convergence difficulties arise (or even otherwise, as a routine matter), it is recommended that, after the 'linear' first iteration is completed, the gradient technique described be employed until the gradient stabilizes, and that the switch to the N-R algorithm then be made. The a priori weighting technique is too subjective and ad hoc and is not recommended.

3. The proper measure of uncertainty or observability of the parameters is provided by the diagonal terms in the dispersion matrix. However, there is some danger in using this as an 'absolute' measure rather than a 'relative' measure, because it will always contain an uncertain multiplicative factor.

4. The p.s.d.'s of the residuals may be used to estimate the actual noise bandwidth.

5. Where multiple maneuvers at identical or similar flight conditions are available, the 'pooling technique' should be used in preference to averaging the estimates from the individual maneuvers. It is particularly helpful to pair the aileron-input data with the rudder-input data so as to improve the estimates, since the latter generally turn out to be worse. [A study of this phenomenon, verifying whether the aileron input always yields better results than the rudder input and, if so, what the reasons are, should be of value and would shed much light on the 'optimal input' problem.]

6. Caution is necessary in using the dispersion matrix at the end of the first 'linear' iteration as a measure of the data, since the matrix may well be non-singular and acceptable at the starting values and yet singular at the true values.

7. Minimizing correlations in the dispersion matrix is of little value; minimizing its trace is more meaningful.
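Recommendation 1 is a relaxation between the parameters and the noise variances. The sketch below illustrates only that outer loop, on a hypothetical model that is linear in the parameters (so that a single N-R step gives the exact weighted least-squares minimum); for the real non-linear aircraft model the inner solve would be replaced by the N-R iterations of section 7. All names are hypothetical.

```python
import numpy as np

def relaxation_fit(X_list, y_list, d0, tol=1e-8, max_outer=50):
    """Alternate weighted least squares in theta with residual-variance updates of d.

    X_list[i], y_list[i] -- regressors and measurements for output i (linear toy model)
    d0                   -- starting noise variances (e.g. all equal)
    """
    d = np.asarray(d0, dtype=float)
    for _ in range(max_outer):
        # Minimize sum_i (1/d_i) ||y_i - X_i theta||^2 for fixed d.
        S = sum(X.T @ X / di for X, di in zip(X_list, d))
        b = sum(X.T @ y / di for X, y, di in zip(X_list, y_list, d))
        theta = np.linalg.solve(S, b)
        # Use the residuals as the new noise-variance estimates.
        d_new = np.array([np.mean((y - X @ theta) ** 2) for X, y in zip(X_list, y_list)])
        if np.max(np.abs(d_new - d) / d) < tol:
            return theta, d_new
        d = d_new
    return theta, d
```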
RESIDUALS BY ITERATION: CESSNA #11

    [illegible]   2.24    .524    .516         .354         .350   .350   .350
    p             2.89    [illegible]   .562   .563         .561   .561
    r             1.85    .868    .825         .582         .584   .585   .585
    [illegible]   3.49    .559    .502         .496         .497   .497
    [illegible]   .0333   .0148   .0108        [illegible]  [illegible]   .0110   .0110
Table 3

Start: -.006, -.001, .00032, -.0013, -.00005, .0002, .0002







F8 FLIGHT CONDITIONS

    Run   Mach   Alpha
    4     .806   3.83
    5     .788   4.01
    20    .793   4.48
    23    .796   3.89

    Q, V, Input (partially legible): 211.1, 779.2, 200.3, 177.., 177.9
Table 6

    54.8   53.3   51.6   557.7   20.5   12.0   1.93   57.39   9.32   1.89   24.4